The concept of the Markov chain of order L, which we essentially owe to the Russian mathematician Andrej Andreevic Markov (1907), has two drawbacks. First, the number of parameters of the model grows exponentially with the order L of the chain. This brings about computational and storage problems during implementation, including for limited memory length L.
An improvement initially put forward by (Rissanen - 1983) and used particularly in compression data (Weinberger - 1992, Willems - 1995) was the Variable Length Markov chain (Buhlmann - 1999). This model can be represented by a tree, known as Prediction Suffix Tree – PST (Ron - 1996), certain branches of which are depth L and others of an inferior depth to L, whereas the Markov chain of order L corresponds to a complete tree of depth L. By reducing the storage cost, pruning the branches of the tree will enable us to increase the order of the model and, thereby improve performance.